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ABSTRACT 

Context. Kinematical data such as the mean velocities and velocity dispersions of stellar samples are useful tools to study galactic 
structure and evolution. However, observational data are often incomplete (e.g., lacking the radial component of the motion) and may 
have significant observational errors. For example, the majority of faint stars observed with Gaia will not have their radial velocities 
measured. 

Aims. Our aim is to fomulate and test a new maximum likelihood approach to estimating the kinematical parameters for a local stellar 
sample when only the transverse velocities are known (from parallaxes and proper motions). 

Methods. Numerical simulations using synthetically generated data as well as real data (based on the Geneva-Copenhagen survey) 
are used to investigate the statistical properties (bias, precision) of the method, and to compare its performance with the much simpler 
"projection method" described by Dehnen & Binney (1998). 

Results. The maximum likelihood method gives more correct estimates of the dispersion when observational errors are important, 
and guarantees a positive-definite dispersion matrix, which is not always obtained with the projection method. Possible extensions 
and improvements of the method are discussed. 

Key words, astrometry - methods: data analysis - methods: numerical 



1. Introduction 

Statistical information about the motions of stars relative to the 
sun may contain important hints concerning the origin and his- 
tory of the stars themselves (e.g., by identifying kinematic pop- 
ulations and streams) as well as of the physical properties of 
the Galaxy (through dynamical interpretation of the motions). 
Ideally this requires that all six components of phase space (po- 
sitions and velocities) are known for all the stars in the investi- 
gated sample. This can in principle be achieved through a com- 
bination of astrometric data (providing positions and distances, 
from the parallaxes, and tangential velocities from the proper 
motions and parallaxes) and spectroscopic radial velocities. Very 
often, however, such complete data are not available for all the 
stars in a sample. For example, the Hipparcos Catalogue (IESAI 
11997 ) gives the required astrometric information for large sam- 
ples of nearby stars, but not all of them have known radial veloc- 
ities. Restricting the investigation to stars with measure d radial 
velocities could introduce a serious selection bias (Binn ev et al.l 
Il997h . The Gaia mission (Ide Bruiinell2012h will not only pro- 
vide vastly improved astrometric data for much larger stellar 
samples, but also radial-velocity measurements for stars brighter 
than ^ 17 mag; for the fainter stars, however, the phase space 
data will still be incomplete. On the other hand, on-goi ng spec- 
troscopic surveys such as RAVE (ISteinmetz et al.ll2006l) already 
provide radial velocities for large samples without the comple- 
mentary astrometry. Thus we are often faced with the problem 
to estimate kinematical parameters (such as mean velocities and 
velocity dispersions) from incomplete phase space data, lacking 
either the radial or tangential velocity components, or the ac- 
curate distances needed to derive the tangential velocities from 
proper motions. 

IDehnen & BinnevI (Il998l) derive a simple and elegant 
method to derive local stellar kinematics (mean velocity and ve- 



locity dispersion) for a group of stars, when the tangential veloc- 
ities are known, but not the radial velocities. This method has be- 
come popular and can be considered a standard for this purpose. 
We will refer to it as the projection metho d (PM) throughout 
this pa per. In addition to the original work by IDehnen & BinnevI 
(1998'), the same method (or variants of i t) has been used, e.g., 
by Mianard (2000), Brosche et al. (200ll). lvan Leeuw en (200'^, 
and Au mer & BinnevI (|2009). 

Despite its wide usage in the literature, the projection method 
is not founded on any particular estimation principle such as 
maximum likelihood (ML) or Bayesian estimation, but simply 
fit the projected first and second moments of the space veloc- 
ities to the corresponding observed moments of the tangential 
velocities. This works well for large samples, provided that the 
observational uncertainties in the data are negligible compared 
to the uncertainties due to the sampling, but there is no guar- 
antee that it is unbiased, or asymptotically efficient as expected 
for an ML estimate. On the contrary, by neglecting the observa- 
tional errors the resulting velocity dispersions are probably over- 
estimated. Moreover, for small samples the projection method 
may sometimes give unphysical results, in that the estimated dis- 
persion tensor is not positive-definite - implying that the mean 
square velocity is negative in some directions. Since it is desir- 
able that the kinematic parameters can be consistently and effi- 
ciently estimated also for small samples and in the presence of 
non-negligible observational uncertainties, we introduce here a 
new and more rigorous approach, based on the maximum likeli- 
hood method. 



2. Kinematic parameters and the projection method 

For a homogeneous population of stars the phase space density 
/(r, V, t) describes the density of stars as a function of position 



1 



T. Aghajani and L. Lindegren: Maximum likelihood estimation of local stellar kinematics 



(r), velocity (v) and time t. By the local kinematics we mean 
the distribution function here (r - 0) and now (f = 0), that is 
f(v) = f{0,v,Q). It is usually assumed that /(d) is a smooth 
function, and the most common assumption is the Schwarzschild 
approximation, that f(v) is a three-dimensional Gaussian distri- 
bution (velocity ellipsoid), or a combination of a few Gaussian 
distributions. The velocity ellipsoid is completely described by 
the mean velocity v and the dispersion tensor D. 

Throughout this paper we use heliocentric galactic coordi- 
nates (x, y, z) with +x pointing towards the Galactic Centre, +y 
in the direction of rotation at the Sun, and +z towards the north 
Galactic pole. The corresponding heliocentric velocity compo- 
nents are denoted {u^, Vy, v,) or (m, v, w). 

For a stellar population, the mean velocity and dispersion 
tensor are defined as 



v^E[d] , 



D 



(1) 



(2) 



where E is the expectation operator (population mean). Given 
the 3-dimensional velocities i;,, / = 1, 2, . . ., n for a sample of 
stars it is possible to estimate the population mean values by the 
sample mean values. 



D ~ 1 V , D ^ - V(d,- - v)(Vi - . 
n ^ n ^ 



(3) 



Note that this estimate of D is always positive definite (except 
in trivial degenerate cases), but it requires 3-dimensional veloc- 
ities and the dispersions are likely to be overestimated when the 
velocity components have significant observational errors. 
The tangential velocity of a star can be written 



T - Av , 



(4) 



where A is a projection matrix depending only on the position 
of the star. In the projection method the assumption that the po- 
sitions (and hence the matrices A) are uncorrelated with the ve- 
locities is invoked to derive a relation between the mean of r and 
the mean of v, allowing the latter to be solved. In a similar way, 
the elements of D are derived from the relation between the sec- 
ond moments of t and v. For further details we refer to the paper 
by Dehnen & Binney (1998). 

3. Maximum likelihood estimation of the kinematic 
parameters 

3.1. Overview of the model 

We assume a Gaussian distribution of the velocities in all direc- 
tions. That is, the velocity of a star follows a 3-dimensional nor- 
mal distribution, v ~ N(v, £)).[3 Given the position and true par- 
allax p of the star we can calculate its true proper motion in lon- 
gitude and latitude (ni,Hb). Adding Gaussian observational er- 
rors to these we get the observed proper motions jli ~ N(fj.i, cr^), 
fib ~ N(fib,o-jj and the observed parallax p ~ N{p,cPp). The 
observational uncertainties 07, and CTp are assumed to be known. 
The ML formulation requires that the probability density func- 
tion of the observed data (fli,jj.b,p) is written as a function of the 
model parameters. 



' This notation x ~ N(m, V) means that the random variable x fol- 
lows the multi-dimensional normal distribution with mean value m and 
covariance matrix V. Similarly for a one-dimensional normal variable 
X ~ N(m, s~), m is the mean value and the variance. 



3.2. Exact expression for tine likeliliood 

For a problem involving n stars, the parameters of the model are: 

- D = the mean velocity of the stellar population (a 3-vector, or 
3x1 matrix); 

- D - the dispersion tensor of the stellar population (a sym- 
metric 3x3 matrix; contains 6 non-redundant elements); 

- p - the true parallaxes of the stars (an n-vector, or n x 1 
matrix). 

It is necessary to introduce the true parallaxes as formal model 
parameters, although the strategy is that they will be eliminated 
on a star-by-star basis leaving us with a problem with only nine 
model parameters, namely the (non-redundant) components of 
v and D. We denote by the vector the complete set of model 
parameters (i.e., the unknowns to be estimated). For n stars the 
total number of model parameters is n + 9. 

The observables are, for each star / - 1 . . . n, the observed 
components of proper motion, and ilb,i, and the observed par- 
allax pi. The total number of observables is 3n. These have ob- 
servational uncertainties that are given by the 3x3 covariance 
matrix C,-. Sometimes it is useful to denote by the vector x the 
complete set of observables (or data). 

Given the observations, the likelihood function L{ff) = 
L{v,D,p) numerically equals the probability density function 
(pdf) fxix\0) of the observables x, given the model parameters 
0. The objective is to find the ML estimate of 0, denoted 0, i.e., 
the (hopefully unique) set of parameter values that maximizes 
1X0) or, equivalently, the log-likelihood {(0) - In 1(0). The total 
log-likelihood function is given by 



f(v, D,p)^^ [inffijifiilpi) + In giipi - /?,)] 



(5) 



where gr, is the centered normal pdf with standard deviation 

1 /2 

cTp i - [C/lgj , i.e., the uncertainty of the parallax p,. It is clear 
that the parameter p, only affects the ith term in the sum above. 
Therefore, when maximizing with respect to we only need to 
consider that one term. For simplicity we drop, for the moment, 
the subscript ; so that the term to consider (for one star) can be 
written: 



^(u, D,p)^\n fji(fi\p) + In g(p - p) 
= lnff,(fi\p)-^\n(2ncrl)- 



s^2 



(P - P) 



(6) 



where we have used g{x) - (27rcrp" ' exp(-x /2crp for the 
normal pdf with standard deviation cTp. Note that fji{fi\p) of 
course depends on v and D as well, although we focus on 
the dependence on p here. We now need an explicit expres- 
sion for the pdf ffi{fi\p) of the observed proper motion fi when 
the true parallax p is known. This is clearly a two-dimensional 
Gaussian (since both the velocities and the observational errors 
are Gaussian), with expected value 



E\fi\^Ml). 

Here, M is the 2x3 matrix 



M 



P_ 
K 



- sin / cos / 

-cos I sin b -sinZcosZ? cosb 



(7) 



(8) 



projecting any vector (in this case v) to its components in the 
directions of increasing galactic coordinates / and b. The fac- 
tor p/K converts the velocity to proper motion (numerically. 
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K - 4.7405 if the units used are km s"', mas, and mas yr"'). 
Assuming that the observational errors are not correlated with 
the velocities, the covariance of fi is the sum of the covariance 
due to the velocity dispersion and the covariance due to the ob- 
servational errors, i.e.: 



Covlju] =Cf,^ MDM^ 



2 



(9) 



where cr^/ and cr^t are the observational uncertainties in yU; and 
jUi, and p their correlation coefficient. The pdf is then 



-1/2 



exp 



(10) 



Table 1. An example with 100 stars illustrating how the esti- 
mated velocity dispersion depends on the value of a. 



a 


u 


v 


w 






<T„, 


True 


10.000 


15.000 


7.000 


22.000 


14.000 


10.000 


0.0 


10.090 


14.775 


7.083 


21.712 


13.636 


9.783 


0.5 


10.033 


14.808 


7.090 


21.610 


13.638 


9.902 


1.0 


10.032 


14.806 


7.091 


21.429 


13.619 


10.041 


1.5 


10.031 


14.805 


7.093 


21.251 


13.604 


10.182 


2.0 


10.029 


14.804 


7.094 


21.076 


13.591 


10.324 


10.0 


10.015 


14.824 


7.143 


18.587 


13.845 


12.819 



Notes. In this case a = would actually be used (no regularization). 
The solutions for a > were only made to show that the velocity dis- 
persion becomes more isotropic when regularization is used. 



3.3. Eliminating tine parallaxes 

The large number of parameters in the likelihood function in 
Eq. (|5]l makes it difficult to maximize. We approach this problem 
by finding the maximum of Eq. (|6]) with respect to p for each star 
and thus eliminating p. Since the expression for fpifilp) is quite 
complicated, it is not likely that this can be done exactly, by an- 
alytical means, but we can find an approximate solution valid in 
the limit of small relative parallax error, cTp/p «: 1. 

To eliminate p we take the partial derivative of Eq. (|6]l with 
respect to p and set it equal to 0. The result is 



P^P + o-p F{p) , 
where we introduced for brevity 
d\nfp{fi\p) 



F(p) = 



dp 



(11) 



(12) 



So far no approximation has been made. However, in Eq. (ITTl 
the function F is to be evaluated for the value of p which we are 
seeking, so we have not really solved the problem. But if o-p <K p 
we can evaluate F(p) al p - p to obtain the explicit formula {(jj^ jj^ ~ ^ 



3.4. Numerical solution method 

We make use of a numerical method for maximizing the likeli- 
hood function, rather than an analytical one, as the derivatives of 
Eq. (fTSI l become very c omplicated. We have used the Nelder- 
Mead simplex method jLagarias et al.l [19981) . implemented in 
MATLAB as the function fminsearch, to minimize the neg- 
ative of the log-likelihood. The simplex method is useful as it 
does not require us to know the derivatives. 

As soon as there are enough stars in the sample the mini- 
mization works fine but we have found two situations in which 
the algorithm diverges. The first happens when some stars in the 
sample have very small parallaxes or very large tangential veloc- 
ities. The second situation may occur when the sample is very 
small. What happens in both cases is that the estimated disper- 
sion tends towards zero in some direction, apparently because 
the measurement errors are enough to explain the observed dis- 
persion, resulting in a singular dispersion matrix. 

To compute a solution even in these cases, we introduce a 
regularization parameter a > 0. The regularized log-likelihood 
is: 



P + o-p 



d\nff,(fi\p) \ 
dp 



1 



In ff,Aiii\v,D, Pi) +-cr^,jFi(pif 



- aln ■ 



■ p + crlF(p). 



(13) 



This approximation is safe to do under the assumption of small 
relative parallax error, since Eq. (ITJi is obviously correct to first 
order in cr^. Inserting the first-order Taylor expansion 



ln//i(Alp) - ln//i(/i|p) + (p- p)F(p) 



(14) 



in Eq. (|6]l and using p - p - cr^,F(p) from Eq. ( fT3T l. the log- 
likelihood maximized with respect to p becomes: 



{(v,D)^\nfp(fi\p)+:^^pF(p) 



(15) 



Here we have neglected the additive constant -iln(2;rcrp, 
which does not depend on the model parameters and therefore 
is irrelevant for the ML problem. 

Recalling that this is just the log-likelihood term for one star, 
the total log-likelihood function, after maximization with respect 
to p, is therefore: 



(17) 

where S max and S min are the extreme singular values of the sin- 
gular value decomposition of D. ^maxZ-Smin is the square of the 
ratio of the longest and shortest axes of the velocity ellipsoid. 
If a is large the algorithm will tend to favour a small axis ra- 
tio, resulting in an isotropic velocity dispersion in the limit of 
large a. In order to use as little regularization as necessary we 
try Eq. ([TtT i with a = and, if diverging, increase a by steps of 
0.5 until a converged estimate is obtained. The result is a disper- 
sion matrix that is always positive-definite, but when a > it 
may be more isotropic than it should be (Table [T]l. 

The derivative in Eq. (fTST i is calculated numerically, using 
the approximation 

d \nfp{jl\p) ff,(fi\p + o-p) - ffiifilp - o-p) 



dp 



2cr„ 



(18) 



Using a step size of +o-p is logical as we are looking for solutions 
typically within one standard deviation of the observed value. 



(v,D)^Y. 



\nf-^m^,D,pi) + -o^.FiiPif 



(16) 



4. Testing on synthetic and real samples 

We test our method ( ML) together with the pro jection method 
(PM) as described by iDehnen & BinnevI (Il998i) on two types 
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of samples. The first type is a synthetic sample, where nearby 
"stars" with known mean velocity and velocity dispersion have 
been generated. This sample allows us to investigate the bias of 
the two methods. The second type is a sample of real stars taken 
from the Geneva- Copenhagen survey of nearby F and G dwarfs 
jNordstrom et al.l2 004). These stars have measured radial veloc- 
ities which allow us to estimate their mean velocity and velocity 
dispersion directly from Eq. (I3)- We use the resulting estimates 
as the "true" values when comparing the methods. 

4.1. Synthetic samples 
4.1 .1 . Generating the data 

The generation of synthetic stars strictly follows the proba- 
bilistic model outlined in Sect. 13.11 Observational uncertainties 
were set to cr^ = 1, 3, 10 and 30 mas yr"' for the compo- 
nents of the proper motions, and to o-p = 1 mas for the paral- 
laxes. The stars have randomly generated positions {x,y,z) (in 
pc) within a radius of 100 pc from the sun. Each star is also 
given a randomized velocity using a normal distribution with the 
mean parameter v - (10, 15, 7) km s"' and dispersion matrix 
D = diag(22^, 14^, 10^) km^ s"^. From the rectangular coordi- 
nates the galactic longitude / and galactic latitude b are obtained 
with 



/ = atan2(i/, x) , atan2(z, y + y^) . (19) 

Then for any position in sky {I, b) we define three orthogonal 
unit vectors 





COS b cos I 




- sin / 




- sin cos / 


u = 


cos b sin / 


/ = 


cos I 


b = 


- sin b sin I 




sinb 









cos b 



where u is the direction towards the star, and / and b are unit 
vectors in the tangential plane in the sky: +1 in the direction of 
increasing longitude, and +b in the direction of increasing lati- 
tude, [h, /, b] is called the normal galactic triad at the point (/, b). 
The true parallax is given by 

1000 mas 

p= =, (21) 

yx^ + + 

whereupon the true proper motion in longitude and latitude is 
calculated as 

^it = ^I v and iih^ ^b v (22) 

A A 

respectively. Here, K - 4.7405 km s"' kpc"' (mas yr"')"' is the 
numerical constant relating mas yr"' to km s"' kpc"'. 

To get the observed proper motions and parallax, the en^ors 
in proper motion and parallax are added to the true values. Errors 
were drawn from the normal distribution with zero mean value 
and standard deviation equal to the observational uncertainties 
(T/j and cTp. 

As the simulated data set depends on many different param- 
eters it is not possible to investigate the methods for all possi- 
ble combination of the parameters. We restrict ourselves to keep 
most of the parameters fixed (v, D, R and cTp), and only study 
how the methods perform as a function of n, the number of stars 
in the sample, and o-p, the uncertainty in proper motion. We use 
n = 30, 100, 300, 1000 and o-p = 1, 3, 10, 30 mas yr"', which 
gives 16 diff'erent combinations. For each combination 100 sam- 
ples are generated with different seeds for the random generator 



The 100 samples are thus diff'erent in the generated true posi- 
tions and velocities, as well as in the observational errors. The 
PM and ML methods were applied to all 1600 samples and the 
mean values and RMS variations of the estimated parameters 
calculated. 

4.1.2. Results 

With the synthetic sample we can see a difference in how the 
uncertainty in proper motion aff'ects the values of the calculated 
velocities and velocity dispersions for the two methods. 

In the left column of Fig. [1] results from the PM are shown. 
As expected, the dispersions are increasingly biased towards 
higher values when the uncertainty in proper motion increases. 
The eff'ect is negligible when o-p - I or 3 mas yr"', but for 
o-p - 10 or 30 mas yr"' the increase in the dispersions is quite 
significant. This is reasonable since an error of 30 mas yr at 
the median distance of the sample (80 pc), corresponds to a lin- 
ear velocity error of about 1 1 km s"' , which is comparable to the 
true velocity dispersion. As can be seen in the bottom left panel 
of Fig. [T] the dispersion component in w is most severely over- 
estimated (^ 15 km s"' versus the true value 10 km s"'), while 
the u and u components are somewhat less afifected (18 versus 
14 km s"' and 25 versus 22 km s"', respectively). Indeed, in all 
the cases the estimated dispersion (for n > 100) is roughly equal 
to the sum in quadrature of the true dispersion and the linear ve- 
locity error corresponding to the proper motion uncertainty. This 
is a useful rule-of-thumb to estimate the expected bias of the PM 
under more general conditions. A more unexpected result is that, 
for the smallest samples (n = 30), the resulting dispersion ma- 
trix is unphysical (not positive definite) in about 1 % of the cases. 
However, this does not happen with the larger samples. 

The column to the right in Fig. [T] show the results of the ML 
method. We note that: 

1 . Unlike the PM, the ML method is able to give unbiased dis- 
persions even for very large o-p, at least if the sample is not 
too small (n > 100). 

2. For the smallest samples (n - 30) it is often necessary to use 
regularization. In most of these cases it is suflicient to use 
a = 0.5, which does not change the axis ratio much. Only 
in very few cases was a > 3 needed, which gives a nearly 
isotropic dispersion. 

For the smallest sample size both PM and ML tend to underes- 
timate the dispersion, especially along the major axis (m compo- 
nent) of the velocity ellipsoid. 

4.2. Application to the Geneva-Copenhagen survey 

The Geneva-Copenhagen survey dNordstrom et al.ll2004l) is the 
largest and most complete study of a magnitude-limited, kine- 
matically unbiased sample of F and G stars in the solar neigh- 
bourhood. It contains space velocities, Stromgren photometry, 
metallicities, rotation velocities and ages for more than 1 6 000 
stars. The catalogue gives complete (u, u, w) space velocities 
based on Hipparcos data and radial velocities, from which the 
"true" vales of the mean velocity and velocity dispersion can be 
computed using Eq. Q 

4.2.1. Data used 

Not all stars in the Geneva-Copenhagen survey could be used in 
this study. Stars with large relative parallax errors {pi o-p < 10), 
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Fig. 1. Plots illustrating velocity dispersion versus the number of stars, for different values of cr^, as calculated for the synthetic 
samples described in Sect. 14. II Diagrams in the left column are for the projection method (PM), the right column is for the maximum 
likelihood method (ML). The symbols represent the different galactic components of the dispersion: cr„ (•), cr„ (▼), cr„, (a). The true 
values are indicated by the horizontal lines. 



large peculiar tangential velocities (|At| > 200 km s '), or miss- 
ing some of the necessary data were eliminated. The remaining 
7796 stars were divided into 10 bins of approximately equal size, 
sorted by their Stromgren color index b - y. 

4.2.2. Results 

Tableland Fig.|2]give results of both estimation methods (PM 
and ML) using tangential velocities only, as well as from the 
full space velocities (GC). No error overlines are given here as 
the computation of the formal errors has not been implemented 
(cf. Sect. |5]l. It is seen that PM and ML give very similar results 
for all the samples and for all nine kinematic parameters, and 
they both agree well with the "true" values (GC). In the light of 
the experiments with the synthetic samples this is not surpris- 
ing since the observational errors in this case are generally small 



(few mas), resulting in almost negligible tangential velocity er- 
rors. However, it is gratifying to note that the ML performs at 
least as well as the PM in this case. 

5. Conclusions 

Galactic astrophysics is entering a new era in which large-scale 
astrometric, photometric and spectroscopic surveys provide vast 
quantities of data that need to be interpreted in terms of different 
structural, kinematic and evolutionary models. In this context it 
is essential to have at our disposal estimation techniques that can 
deal with incomplete data (e.g., missing radial velocities) and 
that are statistically efficient and reasonably unbiassed. It is also 
important to take the observational uncertainties into account, 
for although some of the surveys are much more precise than 
earlier ones (e.g., Gaia versus Hipparcos), they are also applied 
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Table 2. Results of both the projection method (P M) and our maximum likelihood method (ML) applied to samples from the 
Geneva-Copenhagen survey (iNordstrom et al.ll2004l) . 



b 


-y 


n 


Method 


u 


V 


w 






o-„, 






Pvw 








GC 


-11.86 


-11.94 


-6.75 


23.72 


14.06 


10.52 


0.26 


-0.03 


-0.04 


0.205 


- 0.272 


683 


PM 


-11.79 


-12.36 


-6.31 


22.52 


13.49 


10.41 


0.39 


0.02 


-0.10 








ML 


-11.58 


-12.01 


-6.50 


22.65 


13.24 


9.93 


0.36 


-0.01 


-0.10 








GC 


-12.12 


-12.89 


-7.13 


25.37 


16.33 


12.28 


0.25 


0.03 


-0.01 


0.272 


- 0.296 


719 


PM 


-12.21 


-12.84 


-7.03 


24.38 


17.26 


11.52 


0.34 


0.10 


0.07 








ML 


-12.22 


-12.65 


-7.04 


24.47 


16.23 


11.19 


0.31 


0.10 


-0.06 








GC 


-9.24 


-13.12 


-8.30 


27.73 


16.83 


13.88 


0.27 


-0.03 


-0.02 


0.296 


-0.313 


704 


PM 


-8.99 


-13.53 


-7.02 


27.93 


17.07 


13.96 


0.25 


-0.09 


-0.06 








ML 


-9.49 


-13.19 


-7.01 


28.42 


16.58 


12.53 


0.23 


-0.08 


0.00 








GC 


-9.11 


-15.49 


-7.38 


30.62 


17.83 


14.00 


0.10 


-0.00 


0.05 


0.313 


-0.331 


702 


PM 


-9.46 


-16.51 


-6.66 


29.99 


18.53 


14.28 


0.11 


0.02 


-0.02 








ML 


-9.52 


-16.25 


-7.23 


30.04 


17.93 


13.79 


0.10 


0.04 


-0.01 








GC 


-10.14 


-15.75 


-8.56 


32.49 


19.30 


17.09 


0.09 


-0.15 


0.01 


0.331 


- 0.348 


768 


PM 


-10.65 


-16.48 


-9.57 


30.60 


17.75 


15.60 


0.07 


-0.06 


0.22 








ML 


-10.40 


-15.98 


-9.38 


30.10 


16.84 


15.12 


0.06 


-0.06 


0.15 








GC 


-7.41 


-20.49 


-7.07 


36.75 


27.70 


18.59 


0.14 


-0.07 


-0.03 


0.348 


- 0.370 


777 


PM 


-7.98 


-19.89 


-6.77 


38.22 


26.29 


15.79 


0.02 


-0.13 


0.24 








ML 


-7.01 


-19.37 


-6.86 


32.98 


24.69 


15.77 


-0.03 


0.02 


0.22 








GC 


-9.68 


-23.52 


-7.25 


40.57 


28.75 


21.10 


0.14 


-0.02 


-0.08 


0.370 


-0.391 


820 


PM 


-8.31 


-24.03 


-7.58 


39.55 


31.05 


24.19 


-0.05 


-0.11 


-0.09 








ML 


-8.15 


-23.24 


-7.63 


38.83 


25.98 


22.30 


-0.01 


-0.06 


-0.04 








GC 


-11.39 


-26.93 


-6.96 


42.07 


27.06 


22.00 


0.14 


0.00 


-0.08 


0.391 


-0.416 


835 


PM 


-12.93 


-27.10 


-7.49 


40.93 


24.30 


22.84 


0.25 


0.05 


-0.09 








ML 


-12.45 


-26.90 


-6.99 


40.01 


23.28 


23.16 


0.21 


0.04 


-0.07 








GC 


-13.11 


-27.00 


-7.19 


37.81 


26.82 


20.64 


0.16 


-0.08 


0.04 


0.416 


- 0.454 


865 


PM 


-13.07 


-27.32 


-7.07 


39.52 


27.46 


19.58 


0.09 


-0.12 


0.06 








ML 


-13.09 


-27.28 


-7.02 


38.63 


26.58 


19.99 


0.09 


-0.08 


0.00 








GC 


-14.28 


-24.00 


-8.55 


41.63 


31.18 


21.66 


0.17 


-0.01 


0.09 


0.454 


-0.981 


923 


PM 


-13.32 


-27.13 


-8.52 


43.00 


31.06 


21.29 


0.10 


-0.07 


0.07 








ML 


-13.74 


-26.29 


-8.82 


42.78 


30.00 


19.59 


0.14 


-0.10 


-0.02 



Notes. The stars are divided into color bins according to the limits inh-y in the first column, n is the number of stars in the bin. The third column 
shows the estimation method used. The subsequent columns give the mean velocities Ti, v, w, the velocity dispersions cr„, cr„, (in km s"') and the 
correlation coefficients pij = DijiDgD jjY^ ^^ (where /, / = u, v, w) from the estimated dispersion matrix. GC means using the full space velocities, 
including the radial velocities as given bv lNordstrom et alj ( l2604l) . 



to much more distant stars, making the observational eiTors no 
less important. 

For many stars brighter than 17th magnitude Gaia will pro- 
vide the complete space velocity vector through a combination 
of its astrometric and spectroscopic measurements. However, the 
vast majority of stars observed by Gaia are fainter than 17th 
magnitude and will have no radial velocities. This is also the 
magnitude range where the astrometric errors in parallax and 
proper motion start to become problematic for studies of the 
galactic kinematics beyond the solar neighbourhood. For exam- 
ple, solar-type stars at 5 kpc distance will be observed at appar- 
ent magnitude 18 to 20, depending on the extinction, resulting 
in relative parallax errors of at least 50% and transverse velocity 
errors of several km s"'. The very large number of such stars 
will permit a detailed mapping of their distribution functions 
provided that the statistical biases can be mastered. 

In this paper we have considered a seemingly very simple 
problem, namely to estimate the nine parameters describing the 
velocity distribution of stars under the Schwarzschild approx- 
imation, based exclusively on astrometric data and taking into 
account the observational uncertainities in the parallaxes and 
proper motions. The rigorous application of the maximum like- 



lihood method to this problem turns out to be surprisingly com- 
plex, and we have devised an approximate numerical method to 
solve it. We have tested the method on both synthetic and real 
samples of stellar data, and found tha t it performs slightly bet - 
ter than the simple projection method (iDehnen & Binnevlll998l) . 
especially when the observational eiTors are important and for 
small samples. For very small samples (less than about 30 stars) 
the projection method sometimes gives unphysical results. This 
is avoided with our method, which however may require regu- 
larization for such small samples, leading to a more isotropic 
dispersion tensor 

Several authors have applied maximum likelihood or 
Bayesian methods to estimate stellar kinematics using formu- 
lations that differ in various aspects from our approach. For ex- 
ample, Dehnen ( 1998) describes how general three-dimensional 
velocity distributions can be derived from tangential velocities, 
using the so-called maximum penalized likelihood estimate, i.e., 
a maximum-likelihood estimate constrained to give a smooth ve- 
locity distribution. While this method is very general, it does 
not tak e int o account the obser vational uncertainties. Hog g et aP 
( l2005h and IBovv et al.l (I2009I) derive the parameters for a mix- 
ture of Gaussian distributions from tangential velocity data. 
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Fig. 2. Left. Mean velocities m, v, w versus color index b-y for stars in the Geneva-Copenhagen (GC) survey. Middle. Velocity 
dispersions cr„, (T„, cTu, versus color index b-y for stars in the GC survey. Right. Correlation coefficients pij = DijiDiDjjy^^^ (where 
j = u, u, w) from the estimated dispersion matrix versus color index b-y for stars in the GC survey. Values calculated directly 
from the GC data are shown as a sohd hne. The symbols represent the two different methods used: PM (o), ML (•). 



While they do include the observational uncertainties through 
a linear propagation to the tangential velocities, their formalism 
is quite different from ours in the treatment of the parallax errors. 
It could be interesting to compare the two methods as they both 
depend on (different) approximations for small relative parallax 
errors. 

The present method could be extended and generalized in 
various ways. One obvious extension is to allow that some stars 
have a measured radial velocity instead of, or in addition to the 
astrometric data. Another is to consider a more complex veloc- 
ity distribution, e.g., a superposition of several Gaussian com- 
ponents. It is also desirable to apply a more efficient numerical 
solution technique than the downhill simplex method which was 
chosen for the present experiments for its ease of implemen- 
tation. Alter native methods s uch as a quasi-Newton with finite 
differences dPress et ani2007l) would almost certainly be much 
more efficient. We note that the computing time scales linearly 
with the number of stars (for a fixed number of kinematical pa- 
rameters), which makes it feasible to apply the ML method also 
to much larger samples than in our experiments. Finally, the for- 
mal errors (or more generally confidence regions) of the esti- 
mated quantities need to be computed, which could be done with 
moderate effort by mapping the log-likelihood function around 
its maximum. 

Acknowledgements. We wish to thank the anonymous referee for several useful 
suggestions which helped to improve the manuscript. 
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